
Conversation

@wanx7130 commented Oct 6, 2025

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting the before-and-after results comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


github-actions bot commented Oct 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify bot added the documentation (Improvements or additions to documentation) and ci/build labels on Oct 6, 2025
@wanx7130 closed this on Oct 6, 2025
@mergify bot added the v1 and tpu (Related to Google TPUs) labels on Oct 6, 2025

mergify bot commented Oct 6, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wanx7130.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces significant changes to enable v0 execution on CPU, XPU, and TPU backends, including new worker implementations, model runners, and attention backends. It also includes refactoring of existing backends for better consistency and adds some debug logging. My review found a critical issue in the new CPU MLA backend implementation that would likely cause runtime errors.

```python
    slot_mapping=slot_mapping,
    multi_modal_placeholder_index_maps=placeholder_index_maps,
    enable_kv_scales_calculation=False,
    input_positions=torch.tensor([self.input_data.input_positions]))
```

critical

The input_positions tensor is being created as a 2D tensor by wrapping self.input_data.input_positions in an extra list. The rotary embedding layer expects a 1D tensor for positions. This will likely lead to an indexing error or incorrect behavior at runtime.

Suggested change:

```diff
-    input_positions=torch.tensor([self.input_data.input_positions]))
+    input_positions=torch.tensor(self.input_data.input_positions))
```
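
To make the shape mismatch concrete, here is a minimal standalone sketch (the position values are hypothetical) showing why the extra list produces a 2D tensor while the rotary embedding layer expects a 1D one:

```python
import torch

# Hypothetical positions as collected by the input builder.
input_positions = [0, 1, 2, 3]

# Wrapping the list in another list adds a leading dimension: shape (1, 4).
positions_2d = torch.tensor([input_positions])

# Passing the list directly yields the 1D tensor the rotary embedding
# layer expects: shape (4,).
positions_1d = torch.tensor(input_positions)

assert positions_2d.shape == (1, 4)
assert positions_1d.shape == (4,)
```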


mergify bot commented Oct 6, 2025

⚠️ The sha of the head commit of this PR conflicts with #26284. Mergify cannot evaluate rules on this PR. ⚠️

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Comment on lines +159 to +169
```python
        if self.runner.attn_backend is not None:
            # spec decode (e.g. Medusa) does not have attn backend
            attn_backend = self.runner.attn_backend
            self.att_metadata_builder = attn_backend.get_builder_cls()(self)

    def prepare(self,
                finished_requests_ids: Optional[List[str]] = None) -> None:
        self.seq_group_metadata_list: List[SequenceGroupMetadata] = []
        self.input_data = ModelInputForCPUBuilder.ModelInputData(
            self.runner.model_config.uses_mrope)
        self.att_metadata_builder.prepare()
```


P1: Guard missing when no attention backend is present

In ModelInputForCPUBuilder.prepare() the code unconditionally calls self.att_metadata_builder.prepare(), but self.att_metadata_builder is only created when self.runner.attn_backend is not None (see the comment about speculative decoding having no attention backend). When the runner is used for configurations that do not build an attention backend, prepare() will raise an AttributeError before any inputs are built, breaking CPU speculative decoding. Add a null check before invoking the builder or skip metadata preparation when no attention backend exists.
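
A minimal, self-contained sketch of the guard the review suggests (the class and stub names below are stand-ins, not vLLM's actual implementation):

```python
from typing import List, Optional


class AttnMetadataBuilderStub:
    """Hypothetical stand-in for the real attention metadata builder."""

    def prepare(self) -> None:
        print("attention metadata prepared")


class ModelInputBuilderSketch:
    """Sketch of a guarded prepare(); not vLLM's ModelInputForCPUBuilder."""

    def __init__(self, has_attn_backend: bool) -> None:
        # Mirrors the conditional construction the review flags: the
        # metadata builder only exists when an attention backend does.
        self.att_metadata_builder: Optional[AttnMetadataBuilderStub] = (
            AttnMetadataBuilderStub() if has_attn_backend else None)

    def prepare(self,
                finished_requests_ids: Optional[List[str]] = None) -> None:
        # Guard: skip metadata preparation when no attention backend
        # exists (e.g. Medusa speculative decoding).
        if self.att_metadata_builder is not None:
            self.att_metadata_builder.prepare()


ModelInputBuilderSketch(has_attn_backend=False).prepare()  # no AttributeError
ModelInputBuilderSketch(has_attn_backend=True).prepare()   # prepares metadata
```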

Useful? React with 👍 / 👎.

Labels
ci/build, documentation (Improvements or additions to documentation), kv-connector, needs-rebase, tpu (Related to Google TPUs), v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants